AITopics | provably safe reinforcement learning

Collaborating Authors

provably safe reinforcement learning

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Provably Safe Reinforcement Learning with Step-wise Violation Constraints

Neural Information Processing SystemsDec-26-2025, 13:01:18 GMT

We investigate a novel safe reinforcement learning problem with step-wise violation constraints. Our problem differs from existing works in that we focus on stricter step-wise violation constraints and do not assume the existence of safe actions, making our formulation more suitable for safety-critical applications that need to ensure safety in all decision steps but may not always possess safe actions, e.g., robot control and autonomous driving.We propose an efficient algorithm SUCBVI, which guarantees $\widetilde{\mathcal{O}}(\sqrt{ST})$ or gap-dependent $\widetilde{\mathcal{O}}(S/\mathcal{C}_{\mathrm{gap}} + S^2AH^2)$ step-wise violation and $\widetilde{\mathcal{O}}(\sqrt{H^3SAT})$ regret. Lower bounds are provided to validate the optimality in both violation and regret performance with respect to the number of states $S$ and the total number of steps $T$.

mathcal, provably safe reinforcement learning, widetilde, (6 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Robots (0.96)
Information Technology > Artificial Intelligence > Machine Learning (0.79)

Add feedback

Dynamic Model Predictive Shielding for Provably Safe Reinforcement Learning

Neural Information Processing SystemsMay-27-2025, 13:38:16 GMT

Among approaches for provably safe reinforcement learning, Model Predictive Shielding (MPS) has proven effective at complex tasks in continuous, high-dimensional state spaces, by leveraging a backup policy to ensure safety when the learned policy attempts to take risky actions. However, while MPS can ensure safety both during and after training, it often hinders task progress due to the conservative and task-oblivious nature of backup policies.This paper introduces Dynamic Model Predictive Shielding (DMPS), which optimizes reinforcement learning objectives while maintaining provable safety. DMPS employs a local planner to dynamically select safe recovery actions that maximize both short-term progress as well as long-term rewards. Crucially, the planner and the neural policy play a synergistic role in DMPS. When planning recovery actions for ensuring safety, the planner utilizes the neural policy to estimate long-term rewards, allowing it to observe beyond its short-term planning horizon.

artificial intelligence, machine learning, reinforcement learning, (9 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.88)

Add feedback

Provably Safe Reinforcement Learning with Step-wise Violation Constraints

Neural Information Processing SystemsJan-19-2025, 18:35:55 GMT

We investigate a novel safe reinforcement learning problem with step-wise violation constraints. Our problem differs from existing works in that we focus on stricter step-wise violation constraints and do not assume the existence of safe actions, making our formulation more suitable for safety-critical applications that need to ensure safety in all decision steps but may not always possess safe actions, e.g., robot control and autonomous driving.We propose an efficient algorithm SUCBVI, which guarantees \widetilde{\mathcal{O}}(\sqrt{ST}) or gap-dependent \widetilde{\mathcal{O}}(S/\mathcal{C}_{\mathrm{gap}} S 2AH 2) step-wise violation and \widetilde{\mathcal{O}}(\sqrt{H 3SAT}) regret. Lower bounds are provided to validate the optimality in both violation and regret performance with respect to the number of states S and the total number of steps T . Experimental results demonstrate the superiority of our algorithms in safety performance and corroborate our theoretical results.

mathcal, provably safe reinforcement learning, widetilde, (5 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.64)

Add feedback